Authors
Abstract
To further improve the accuracy of Web information extraction and to overcome the shortcomings of the Hidden Markov Model (HMM) and its hybrid methods in parameter optimization, this paper presents a novel Web extraction algorithm that combines an improved particle swarm optimization and ant colony algorithm (IPSO-ACA) with an HMM. First, an HMM for information extraction is built. Second, an improved hybrid intelligent algorithm combining particle swarm optimization (PSO) with the ant colony algorithm (ACA) is proposed. In this algorithm, the inertia weight of PSO and the ACA parameters, namely the stimulating factor, the volatilization coefficient, and the pheromone, are all adjusted adaptively, and the fitness values of the particles' historical best solutions are used to set the initial pheromone distribution of the ACA. Third, the hybrid intelligent algorithm is used to find an approximate global optimum, and the Baum-Welch (BW) algorithm is then applied for local refinement. This overcomes BW's dependence on initial values and its tendency to become trapped in local optima, while making full use of the global search ability of the hybrid algorithm and the local search ability of BW. Finally, the Viterbi algorithm is used to decode the HMM. Compared with existing HMM optimization methods, the comprehensive Fβ=1 score (the F-measure with β = 1) increases by 7.3% on average, which shows that the improved algorithm effectively enhances optimization performance and extraction accuracy.
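To illustrate the final decoding step, the following is a minimal sketch of standard Viterbi decoding on a toy HMM. It is not the model from this paper: the state names (`TITLE`, `AUTHOR`), the token classes (`word`, `name`), and all probabilities are hypothetical values chosen only for illustration. In the proposed method, the parameters would instead come from IPSO-ACA optimization followed by Baum-Welch refinement.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for `obs` under the given HMM."""

    def log(p):
        # Work in log space to avoid underflow; log(0) is treated as -inf.
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = best log-probability of any path that ends in state s
    score = {s: log(start_p[s]) + log(emit_p[s].get(obs[0], 0.0)) for s in states}
    backptr = []  # one dict per time step: state -> best predecessor

    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log(trans_p[r][s]))
            score[s] = prev[best] + log(trans_p[best][s]) + log(emit_p[s].get(o, 0.0))
            ptr[s] = best
        backptr.append(ptr)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model (illustrative values, not taken from the paper).
states = ["TITLE", "AUTHOR"]
start_p = {"TITLE": 0.8, "AUTHOR": 0.2}
trans_p = {
    "TITLE": {"TITLE": 0.7, "AUTHOR": 0.3},
    "AUTHOR": {"TITLE": 0.1, "AUTHOR": 0.9},
}
emit_p = {
    "TITLE": {"word": 0.9, "name": 0.1},
    "AUTHOR": {"word": 0.2, "name": 0.8},
}
obs = ["word", "word", "name", "name"]

path = viterbi(obs, states, start_p, trans_p, emit_p)
print(path)  # ['TITLE', 'TITLE', 'AUTHOR', 'AUTHOR']
```

Each decoded state labels the corresponding token with the field it most likely belongs to, which is how an HMM assigns extracted text to fields. The sketch uses log-probabilities so that long token sequences do not underflow.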